Standards in Genomic Sciences (2014) 9:948-955 



DOI:10.4056/sigs.4277954 



Non-contiguous finished genome sequence of 
Corynebacterium timonense type strain 5401744T 

Veronique Roux*, Catherine Robert and Didier Raoult 

Aix Marseille Universite, Faculte de medecine, Aix-Marseille Universite, France 

*Correspondence: Veronique Roux (veronique.roux@univ-amu.fr) 
Keywords: Corynebacterium timonense, Actinobacteria 



Corynebacterium timonense strain 5401744T is a member of the genus Corynebacterium, 
which contains Gram-positive bacteria with a high G+C content. It was isolated from the 
blood of a patient with endocarditis. In this work, we describe a set of features of this 
organism, together with the complete genome sequence and annotation. The 2,553,575 bp long 
genome contains 2,401 protein-coding genes and 55 RNA genes, including between 5 and 6 
rRNA operons. 



Introduction 

Corynebacterium timonense strain 5401744T (CSUR P20T = CIP 109424T = 
CCUG 53856T) is the type strain of C. timonense. This bacterium was 
isolated from the blood of a patient with endocarditis [1]. The genus 
Corynebacterium comprises Gram-positive, facultatively anaerobic bacteria 
with a high G+C content. It currently contains over 80 members [2]. The 
combination of chemotaxonomic markers [3,4] and a molecular approach based 
on 16S rRNA and rpoB gene sequence analyses improved the identification of 
members of this genus [5-7]. Corynebacterium species have been isolated 
from human clinical sources [8-14], animal sources [15-18] and the 
environment [19-21]. 

Here we present a summary classification and a set of features for 
C. timonense, together with a description of its non-contiguous finished 
genome sequence and annotation. 

Classification and features 

The 16S rRNA gene sequence of C. timonense strain 5401744T was compared 
with sequences deposited in the GenBank database, confirming the initial 
taxonomic classification. Figure 1 shows the phylogenetic neighborhood of 
C. timonense in a 16S rRNA based tree. 



The bacterium was first characterized in July 2005, in a 56-year-old man 
with a history of infective endocarditis. It was isolated from blood 
culture in the Timone Hospital microbiology laboratory. 

Cells are rods that occur singly, in pairs or in small clusters, 
0.6-2.1 µm long and 0.4-0.6 µm wide. Optimal growth of strain 5401744T 
occurs at 37°C, with a growth range between 25 and 50°C. After 24 hours of 
growth on sheep blood agar at 37°C, surface colonies are circular, yellow, 
glistening and 1-2 mm in diameter. Carbon sources utilized include glucose 
and ribose. Activities of catalase, pyrazinamidase, alkaline phosphatase, 
esterase (C4), esterase lipase (C8), lipase (C14), leucine arylamidase and 
acid phosphatase are detected. The fatty acid profile is characterized by 
the predominance of C18:1 ω9c (36.4%), C17:1 ω9c (27.1%), C16:0 (10.9%) 
and C18:0 (6.1%). Tuberculostearic acid is not detected. The size and 
ultrastructure of cells were determined by negative staining transmission 
electron microscopy (Figure 2). Table 1 presents the classification and 
features of the organism. 
Figure 1. Part of phylogenetic tree highlighting the position of Corynebacterium timonense strain 5401744T 
relative to other type strains within the Corynebacterium genus by comparison of 16S rRNA gene sequences. 
GenBank accession numbers are indicated in parentheses. Sequences were aligned using CLUSTALX, and 
phylogenetic inferences obtained using the neighbor joining method within the MEGA 5 software [22]. 
Numbers at the nodes are percentages of bootstrap values (≥ 50%) obtained by repeating the analysis 1,000 
times to generate a majority consensus tree. Solibacillus silvestris was used as outgroup. The scale bar 
represents 0.005 nucleotide change per nucleotide position. 




Figure 2. Transmission electron micrograph of C. 
timonense strain 5401744T, using a Morgagni 268D 
(Philips) at an operating voltage of 60 kV. The scale bar 
represents 500 nm. 
Table 1. Classification and general features of Corynebacterium timonense strain 5401744T 

MIGS ID    Property                 Term                                  Evidence code (a) 

           Current classification   Domain Bacteria                       TAS [23] 
                                    Phylum Actinobacteria                 TAS [24] 
                                    Class Actinobacteria                  TAS [25] 
                                    Order Actinomycetales                 TAS [25-28] 
                                    Family Corynebacteriaceae             TAS [25,26,28,29] 
                                    Genus Corynebacterium                 TAS [26,30,31] 
                                    Species Corynebacterium timonense     TAS [1] 
                                    Strain 5401744T                       TAS [1] 
           Gram stain               Positive                              IDA 
           Cell shape               Pleomorphic forms                     IDA 
           Motility                 Non-motile                            IDA 
           Sporulation              Non-sporulating                       IDA 
           Temperature range        Mesophile                             IDA 
           Optimum temperature      37°C                                  IDA 
MIGS-6.3   Salinity                 Not reported                          IDA 
MIGS-22    Oxygen requirement       Aerobic and facultatively anaerobic   IDA 
           Carbon source            Glucose, ribose                       NAS 
           Energy source            Chemoorganotroph                      NAS 
MIGS-6     Habitat                  Host                                  IDA 
MIGS-15    Biotic relationship      Free living                           IDA 
MIGS-14    Pathogenicity            Unknown                               NAS 
           Biosafety level          2 
           Isolation                Human blood sample 

(a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report 
exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated 
sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence 
codes are from the Gene Ontology project [32]. If the evidence is IDA, then the property was directly 
observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements. 



Genome sequencing and annotation 
Genome project history 

The organism was selected for sequencing on the basis of its phylogenetic 
position and 16S rRNA similarity to other members of the genus 
Corynebacterium, and is part of a study of the new species characterized 
in our laboratory. The EMBL accession number is CAJP01000000 and consists 
of 58 contigs (> 500 bp) and 10 scaffolds (> 4,375 bp). Table 2 shows the 
project information and its association with MIGS version 2.0 compliance. 



Table 2. Project information 

MIGS ID     Property               Term 

MIGS-31     Finishing quality      High-quality draft 
MIGS-28     Libraries used         One paired-end 3-kb library and one shotgun library 
MIGS-29     Sequencing platforms   454 GS FLX Titanium 
MIGS-31.2   Fold coverage          37.2X 
MIGS-30     Assemblers             Newbler version 2.5.3 
MIGS-32     Gene calling method    Prodigal 
            EMBL ID                CAJP01000000 
            EMBL Date of Release   February 2, 2013 
            Project relevance      Study of new species isolated in the URMITE 
Growth conditions and DNA isolation 

C. timonense strain 5401744T was grown aerobically on 5% sheep 
blood-enriched Columbia agar at 37°C. Five petri dishes were spread and 
colonies scraped and resuspended in 3 ml of TE buffer. Three hundred µl of 
10% SDS and 150 µl of proteinase K were then added and incubation was 
performed overnight at 56°C. The DNA was then extracted using the 
phenol/chloroform method. The yield and the concentration were measured 
using the Quant-it Picogreen kit (Invitrogen) on the Genios Tecan 
fluorometer at 182 ng/µl. 

Genome sequencing and assembly 

Shotgun and 3-kb paired-end sequencing strategies were performed. The 
shotgun library was constructed with 500 ng of DNA with the GS Rapid 
Library Prep kit (Roche). For the paired-end sequencing, 5 µg of DNA was 
mechanically fragmented on a Hydroshear device (Digilab) with an 
enrichment size of 3-4 kb. The DNA fragmentation was visualized using the 
2100 BioAnalyzer (Agilent) on a DNA LabChip 7500 with an optimal size of 
3.5 kb. The library was constructed according to the 454 GS FLX Titanium 
paired-end protocol. Circularization and nebulization were performed and 
generated a pattern with an optimal size of 501 bp. After PCR 
amplification through 15 cycles followed by double size selection, the 
single-stranded paired-end library was quantified using the Genios 
fluorometer (Tecan) at 2,540 pg/µL. The library concentration equivalence 
was calculated as 9.30E+09 molecules/µL. The library was stored at -20°C 
until further use. 
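
The reported molecule count can be reproduced from the mass concentration and the fragment size. The following Python sketch assumes an average mass of 328.3 g/mol per nucleotide of single-stranded DNA and a mean fragment length of 501 nt (the optimal size reported after nebulization). The constant, the function name and the choice of fragment length are assumptions made for illustration, not details taken from the protocol text.

```python
AVOGADRO = 6.022e23        # molecules per mole
NT_MASS_G_PER_MOL = 328.3  # assumed average mass of one ssDNA nucleotide


def molecules_per_ul(conc_pg_per_ul: float, mean_length_nt: float) -> float:
    """Convert an ssDNA mass concentration (pg/µL) to molecules/µL."""
    grams_per_ul = conc_pg_per_ul * 1e-12
    grams_per_mole = mean_length_nt * NT_MASS_G_PER_MOL
    return grams_per_ul / grams_per_mole * AVOGADRO


# Values from the text: 2,540 pg/µL and a 501 bp optimal fragment size.
estimate = molecules_per_ul(2540, 501)
print(f"{estimate:.2e} molecules/µL")
```

Under these assumptions the estimate agrees with the reported 9.30E+09 molecules/µL to three significant figures.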

The shotgun and paired-end libraries were clonally amplified with 2 cpb 
and 1 cpb in 3 SV-emPCR reactions with the GS Titanium SV emPCR Kit 
(Lib-L) v2 (Roche). The yields of the emPCR were 11.5% and 7.92%, 
respectively, within the 5 to 20% range from the Roche procedure. 
Approximately 790,000 beads for the shotgun application and for the 3-kb 
paired-end application were loaded on the GS Titanium PicoTiterPlate PTP 
Kit 70x75 and sequenced with the GS FLX Titanium Sequencing Kit XLR70 
(Roche). The run was performed overnight and then analyzed on the cluster 
through the gsRunBrowser and Newbler assembler (Roche). A total of 252,118 
passed filter wells were obtained and generated 37.19 Mb with a length 
average of 366.5 bp. The passed filter sequences were assembled using 
Newbler with 90% identity and 40 bp as overlap. The final assembly 
identified 10 scaffolds and 46 large contigs (>1,500 bp). 
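
To make the two assembly parameters concrete, the Python sketch below tests whether the end of one read overlaps the start of another with at least 40 bp and at least 90% identity. It is a simplified, ungapped suffix-prefix comparison written for illustration only; it is not Newbler's algorithm, and the function name and defaults are hypothetical.

```python
from typing import Optional, Tuple


def suffix_prefix_overlap(
    left: str,
    right: str,
    min_overlap: int = 40,
    min_identity: float = 0.90,
) -> Optional[Tuple[int, float]]:
    """Return (length, identity) of the longest acceptable overlap, else None.

    Only ungapped alignments between the suffix of `left` and the
    prefix of `right` are considered.
    """
    longest = min(len(left), len(right))
    for length in range(longest, min_overlap - 1, -1):
        tail, head = left[-length:], right[:length]
        matches = sum(a == b for a, b in zip(tail, head))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None
```

Two reads shorter than 40 bp can never satisfy the criterion, and two reads sharing a perfect 50 bp end always do; a real assembler additionally handles insertions, deletions and quality values.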

Genome annotation 

Open Reading Frames (ORFs) were predicted using Prodigal [33] with default 
parameters, but the predicted ORFs were excluded if they spanned a 
sequencing gap region. The predicted bacterial protein sequences were 
searched against the GenBank database [34] and the Clusters of Orthologous 
Groups (COG) database [35] using BLASTP. The tRNAscan-SE tool [36] was 
used to find tRNA genes, whereas ribosomal RNAs were found by using 
RNAmmer [37]. 

Transmembrane domains and signal peptides were predicted using TMHMM [38] 
and SignalP [39], respectively. ORFans were identified if their BLASTp 
E-value was lower than 1e-03 for alignment length greater than 80 amino 
acids. If alignment lengths were smaller than 80 amino acids, we used an 
E-value of 1e-05. Such parameter thresholds have been used in previous 
works to define ORFans. 
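
The length-dependent E-value cutoff can be written as a small decision rule. The Python sketch below is one possible reading of the criteria, with two assumptions that are not stated in the text: an ORF is called an ORFan when its best BLASTP hit is absent or fails the cutoff, and an alignment of exactly 80 amino acids is treated with the stricter 1e-05 cutoff. All names are hypothetical.

```python
from typing import Optional, Tuple


def evalue_threshold(alignment_length_aa: int) -> float:
    """Cutoff from the text: 1e-03 above 80 aa, 1e-05 otherwise."""
    return 1e-03 if alignment_length_aa > 80 else 1e-05


def is_significant_hit(evalue: float, alignment_length_aa: int) -> bool:
    """A hit counts as significant when its E-value is below the cutoff."""
    return evalue < evalue_threshold(alignment_length_aa)


def is_orfan(best_hit: Optional[Tuple[float, int]]) -> bool:
    """Assumption: an ORF with no significant best hit is an ORFan.

    `best_hit` is (E-value, alignment length in aa), or None if BLASTP
    returned no hit at all.
    """
    if best_hit is None:
        return True
    evalue, length = best_hit
    return not is_significant_hit(evalue, length)
```

For example, a hit with an E-value of 1e-04 is significant over a 120 aa alignment but not over a 60 aa alignment, so only the latter ORF would be flagged under this reading.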

To estimate the mean level of nucleotide sequence similarity at the genome 
level between C. timonense and the Corynebacterium genomes available to 
date, we compared ORFs only, using the sequence-based comparison tool of 
the RAST server [40] at a query coverage of >60% and a minimum nucleotide 
length of 100 bp. 
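
The filtering and averaging step can be sketched in Python as follows. The record layout, the field names and the example hits are hypothetical; only the two cutoffs (query coverage above 60% and a minimum length of 100 bp) come from the text, and the sketch does not reproduce how RAST itself reports its comparisons.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class OrfHit:
    identity_pct: float        # nucleotide identity of the ORF pair
    query_coverage_pct: float  # share of the query ORF that is aligned
    query_length_bp: int       # length of the query ORF


def mean_similarity(
    hits: List[OrfHit],
    min_coverage_pct: float = 60.0,
    min_length_bp: int = 100,
) -> Optional[Tuple[float, float, float]]:
    """Return (mean, min, max) identity over hits passing both cutoffs."""
    kept = [
        h.identity_pct
        for h in hits
        if h.query_coverage_pct > min_coverage_pct
        and h.query_length_bp >= min_length_bp
    ]
    if not kept:
        return None
    return mean(kept), min(kept), max(kept)


# Made-up hits for illustration only (not data from this study).
example_hits = [
    OrfHit(72.0, 95.0, 900),
    OrfHit(98.0, 88.0, 450),
    OrfHit(65.0, 40.0, 1200),  # dropped: coverage too low
    OrfHit(80.0, 99.0, 90),    # dropped: ORF too short
]
print(mean_similarity(example_hits))
```

The mean together with the minimum and maximum corresponds to the form in which similarities are quoted in the genome comparison, for example 72.05% (60-99.01%).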

Genome properties 

The genome is 2,553,575 bp long with a 66.85% GC content (Table 3, 
Figure 3). Of the 2,456 predicted genes, 2,401 were protein-coding genes 
and 55 were RNAs. A total of 1,779 genes (74.09%) were assigned a putative 
function, and 116 genes (4.83%) were identified as ORFans. The remaining 
genes were annotated as either hypothetical proteins (369 genes, 15.37%) 
or proteins of unknown function. The distribution of genes into COG 
functional categories is presented in Table 4. The properties and the 
statistics of the genome are summarized in Tables 3 and 4. 
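
The percentages in Table 3 follow directly from the counts. The short Python check below recomputes them; the choice of denominator for each row (genome size, total genes, or protein-coding genes) is inferred from the arithmetic and the table footnote rather than stated row by row in the text.

```python
GENOME_SIZE_BP = 2_553_575
TOTAL_GENES = 2_456
PROTEIN_CODING_GENES = 2_401


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to two decimals."""
    return round(100 * part / whole, 2)


# (count, assumed denominator) for each Table 3 attribute
table3 = {
    "DNA coding region (bp)": (2_289_384, GENOME_SIZE_BP),
    "DNA G+C content (bp)": (1_707_056, GENOME_SIZE_BP),
    "RNA genes": (55, TOTAL_GENES),
    "Protein-coding genes": (2_401, TOTAL_GENES),
    "Genes with function prediction": (1_779, PROTEIN_CODING_GENES),
    "Genes assigned to COGs": (1_753, PROTEIN_CODING_GENES),
    "Genes with peptide signals": (353, PROTEIN_CODING_GENES),
    "Genes with transmembrane helices": (550, PROTEIN_CODING_GENES),
}

for attribute, (count, denominator) in table3.items():
    print(f"{attribute}: {pct(count, denominator)}%")
```

With these denominators every recomputed value matches the tabulated one, including the 66.85% G+C content quoted in the text.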
Figure 3. Graphical circular map of Corynebacterium timonense genome. From outside to the 
center: Contigs (red / grey), COG category of genes on the forward strand (three circles), genes on 
forward strand (blue circle), genes on the reverse strand (red circle), COG category on the reverse 
strand (three circles), GC content. 



Table 3. Nucleotide content and gene count levels of the genome 

Attribute                          Value       % of total (a) 

Genome size (bp)                   2,553,575   100 
DNA coding region (bp)             2,289,384   89.65 
DNA G+C content (bp)               1,707,056   66.85 
Total genes                        2,456       100 
RNA genes                          55          2.24 
Protein-coding genes               2,401       97.76 
Genes with function prediction     1,779       74.09 
Genes assigned to COGs             1,753       73.01 
Genes with peptide signals         353         14.7 
Genes with transmembrane helices   550         22.91 

(a) The total is based on either the size of the genome in base pairs or 
the total number of protein coding genes in the annotated genome 
Table 4. Number of genes associated with the 25 general COG functional categories 

Code   Value   % of total (a)   Description 

J      148     6.16             Translation 
A      1       0.04             RNA processing and modification 
K      136     5.66             Transcription 
L      179     7.46             Replication, recombination and repair 
B      0       0                Chromatin structure and dynamics 
D      17      0.71             Cell cycle control, mitosis and meiosis 
Y      0       0                Nuclear structure 
V      45      1.87             Defense mechanisms 
T      62      2.58             Signal transduction mechanisms 
M      89      3.71             Cell wall/membrane biogenesis 
N      2       0.08             Cell motility 
Z      0       0                Cytoskeleton 
W      0       0                Extracellular structures 
U      27      1.12             Intracellular trafficking and secretion 
O      60      2.50             Posttranslational modification, protein turnover, chaperones 
C      97      4.04             Energy production and conversion 
G      121     5.04             Carbohydrate transport and metabolism 
E      205     8.54             Amino acid transport and metabolism 
F      65      2.71             Nucleotide transport and metabolism 
H      100     4.16             Coenzyme transport and metabolism 
I      78      3.25             Lipid transport and metabolism 
P      176     7.33             Inorganic ion transport and metabolism 
Q      46      1.92             Secondary metabolites biosynthesis, transport and catabolism 
R      233     9.7              General function prediction only 
S      137     5.71             Function unknown 
X      648     26.99            Not in COGs 

(a) The total is based on the total number of protein coding genes in the annotated genome. 



Comparison with other Corynebacterium 
genomes 

To date, 13 genomes of species belonging to the genus Corynebacterium have 
been sequenced. The size of the whole genomes was between 2.32 Mb and 
3.43 Mb (Table 5). The gene number was correlated with the genome size and 
was between 2,187 and 3,131. The G+C content of the genome was less than 
60% for C. diphtheriae, C. glutamicum, C. kroppenstedtii, 
C. pseudotuberculosis, C. resistens and C. ulcerans but was more than 60% 
for C. aurimucosum, C. efficiens, C. genitalium, C. halotolerans, 
C. jeikeium, C. timonense, C. urealyticum and C. variabile. C. timonense 
shared a mean sequence similarity of 72.05% (60-99.01%), 72.15% 
(60.09-97.54%), 74.63% (60-98.37%), 71.83% (60-98.85%), 72.34% 
(60-98.02%) and 71.70% (60-97.03%) with C. diphtheriae, C. efficiens, 
C. genitalium, C. glutamicum, C. jeikeium and C. urealyticum, 
respectively. 



Table 5. Comparison of C. timonense characteristics with Corynebacterium whole genome characteristics. 

Species                  Genome size (Mb)   G+C %   Number of predicted genes 

C. aurimucosum           2.82               60.5    2,630 
C. diphtheriae           2.49               53.5    2,392 
C. efficiens             3.22               62.9    3,064 
C. genitalium            2.35               62.7    2,290 
C. glutamicum            3.31               53.9    3,122 
C. halotolerans          3.22               68.3    2,930 
C. jeikeium              2.48               61.4    2,181 
C. kroppenstedtii        2.45               57.5    2,083 
C. pseudotuberculosis    2.32               52.2    2,187 
C. resistens             2.60               57.1    2,230 
C. timonense             2.55               66.7    2,456 
C. ulcerans              2.56               53.4    2,355 
C. urealyticum           2.36               64.2    2,045 
C. variabile             3.43               67.1    3,131 



Prophage genome properties 

Prophage Finder [41] and PHAST [42] were used to identify potential 
proviruses in the C. timonense strain 5401744T genome. The bacterium 
contains at least one genetic element of around 40.3 kb (with a GC content 
of 64.9%), which we named CT1, on contigs 6-7. A total of 53 open reading 
frames (ORFs) longer than 55 amino acids were recovered from CT1, and most 
of them (44) encode proteins sharing a high identity with proteins found 
in viruses of the order Actinomycetales. A preliminary annotation of CT1 
was performed, and the majority of the putative genes (41) encode 
hypothetical proteins. The ORFs with an attributed function (12) encode 
proteins involved in DNA packaging, cell lysis, tail structural components 
and assembly, head structural components and assembly, lysogeny control, 
and DNA replication, recombination and modification. Forty-seven of the 
ORFs are located on one strand and 6 on the opposite strand. 